
surface eval errors + disambiguate auth failures #minor #72

Draft
BradenBug wants to merge 5 commits into main from bw/cbs-eval-error-surface

BradenBug commented Jun 24, 2026


Summary

Batch 2 (service side) of lab-deliverable-framework production readiness. Stops the deployed service from hiding actionable, non-secret information from labs: the legacy 500 path now returns the real error message, and the six auth-failure reasons are surfaced as distinct, non-secret messages instead of one opaque 401.

Changes

  • Legacy 500 → real message. The global exception handler for the legacy /evaluate-response/ and /final-score/ endpoints now returns {"detail": "Evaluation failed", "errors": [str(exc)]} (mirroring the existing /v1 errors[]), while the traceback stays server-side. The real message is withheld from trial tenants, and the check fails closed: it is also withheld when the tenant is unknown or unset.
  • 401 disambiguation. resolve_descope_tenant now returns a typed AuthResult(tenant, failure), and the middleware and websocket auth map each AuthFailure to a distinct status and detail: an allowlist miss returns 403 ("contact the service operator to request access"), and every other failure returns 401 with a specific message. A missing DESCOPE_PROJECT_ID is now a hard failure at startup, since it is a server-config bug rather than a per-request 401.
  • Tightened AuthResult so that a reject always carries a failure: ok is defined as failure is None, which makes the illegal "rejected but failure=None" state unrepresentable. AuthResult and AuthFailure are exported for service authors who override the auth hook.

Author self-review

  • No auth bypass: every reject path yields ok=False with a failure reason, _failure_response indexes an exhaustive table, and no reject can return a 2xx (verified). The bridge for legacy check_auth overrides that return a boolean still works.
  • Error disclosure fails closed: only a positively resolved, non-trial tenant gets errors[]. Messages name no company or product (framework-clean).
  • Reviewed through the thermo-nuclear gate, which caught the AuthResult illegal state and a trial-tenant info leak on the 500 path and drove fixes for both ahead of this PR.
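The fail-closed disclosure rule for the legacy 500 path can be sketched as a small Python helper. The name legacy_error_body, the is_trial parameter (None meaning the tenant is unknown or unset), the logger name, and keeping only "detail" in withheld responses are assumptions; only the response shape and the trial/unknown withholding come from this PR.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("eval_service")  # hypothetical logger name


def legacy_error_body(exc: BaseException, is_trial: Optional[bool]) -> Dict[str, Any]:
    """Hypothetical helper: build the JSON body for a legacy-endpoint 500.

    is_trial is True for trial tenants, False for a positively resolved
    non-trial tenant, and None when the tenant is unknown or unset.
    """
    # The traceback goes to server logs only, never into the response.
    logger.error("Evaluation failed", exc_info=exc)
    body: Dict[str, Any] = {"detail": "Evaluation failed"}
    # Fail closed: only an explicit False unlocks the real message, so an
    # unknown tenant (None) is handled exactly like a trial tenant.
    if is_trial is False:
        body["errors"] = [str(exc)]
    return body
```

For a non-trial tenant, legacy_error_body(ValueError("rubric missing"), is_trial=False) returns {"detail": "Evaluation failed", "errors": ["rubric missing"]}; with is_trial=True or None it returns only {"detail": "Evaluation failed"}. Testing is_trial is False, rather than not is_trial, is what keeps the None case on the withholding side.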

Related PRs

  • vals-ai/benchmark-orchestrator#7: the orchestrator-side half of Batch 2 (eval-result backfill + provider-key fail-fast).

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactor
  • Docs / config

Testing

  • Added/updated unit tests (226 passing; lint + typecheck clean)
  • Manually tested

Checklist

  • Self-reviewed the diff
  • No debug/dead code left in
  • Docs updated if needed
